=encoding utf-8

=head1 NAME

XRCU is a library that provides efficient lock-less synchronization for read-mostly tasks and structures

=head1 ABOUT THIS DOCUMENT

This file is meant to document the XRCU library, including internal details
and the rationale behind the decisions taken.

This document assumes that the reader is at least somewhat familiar with such
concepts as multi-threading, lock-free algorithms and data structures, and
read-copy-update (RCU).

=head1 ABOUT LOCK FREE ALGORITHMS AND STRUCTURES

The term I<lock free> is used to refer to several different things depending
on the author, but the basic idea is that an algorithm or structure is lock
free if multiple threads can operate on it without the end result being
dependent on said threads' scheduling. That is, it doesn't matter if the
involved threads are suspended or preempted; progress is guaranteed.

The usually mentioned advantage of lock freedom is performance. When you are
using lock free algorithms, you naturally avoid using things like mutexes,
which can have a non-negligible performance impact. On the other hand, other
people point out that lock free algorithms are typically harder to understand,
that debugging is notoriously difficult, and that the benefits generally do
not compensate for these complexities.

The reason why this library was developed is to give other programmers another
option when designing high performant, low latency systems. We do not claim
that lock freedom is a silver bullet when it comes to making programs faster.
However, it's also hard to argue that lock contention tends to be a real
bottleneck, and several projects have moved towards lock free structures
given the performance benefits they bring (see for example, many kernels).

=head1 ABOUT RCU

There is a big problem with using lock free algorithms in programming languages
that feature manual memory management, and it has to do with memory reclamation.
Since lock freedom implies that multiple threads may operate on a given
structure concurrently, it is possible to find ourselves in a situation in which
a thread is reading some data, while another one is deleting that same piece of
data, potentially freeing the memory associated to it.

As such, lock freedom needs an additional twist in languages like C and C++.
We need an additional subsystem that allows us to synchronize reclamation
without using heavyweight methods like mutexes (we are developing a lock-free
library, after all).

This is where RCU comes in. RCU, short for read-copy-update, is a mechanism
that allows multiple threads to I<read> from memory, while postponing
reclamations until it is safe to do so (i.e: until no readers are operating
on said memory). RCU typically assumes that updates are performed using atomic
instructions, and isn't concerned with them (At least, that's the case in this
library).

There are several different ways to implement RCU, and the one chosen in this
library will be detailed later. For now, it's important to point out that RCU
typically makes reading from shared memory a very cheap operation, while
imposing more overhead on updates. Thus, it's more suited for tasks in which
reads are more frequent than modifications.

It should be noted that RCU is not the only way to manage memory reclamations
in a multi-threaded environment, but as its name implies, XRCU chose it because
it was deemed the best option. Since our lock-free structures rely on RCU to
work properly, we'll describe the RCU interface that this library exposes first.

=head2 API design

All public interfaces (functions, types, methods, etc) reside in the namespace
C<xrcu>. It's the explicit intent of the authors to mantain compatibility with
every interface that was ever exposed. Any breaking changes should be done in
different interfaces (either by adding new functions, or by overloading).

The implementation of some template types require additional, internal details,
and those usually reside in a namespace appropriately named C<detail> (Said
namespace is inside C<xrcu>). It goes without saying that users should I<not>
rely on those internal details, and that the authors may freely break
compatibility with them.

XRCU uses some thread-specific data in order to function properly. There may
be some systems that have issues when dynamically loading libraries that use
thread-specific data (I believe Windows Vista was among them). Consult with
your working environment's manual to verify that if you notice any problem.

Finally, it should be mentioned that XRCU requires no initialization or cleanup
at a library level to function properly. Once the library is installed, you
can freely use its API without any further action needed.

Note that XRCU may expose more functions or types that are specified in this
document. This is usually done to avoid code bloat. Anything that isn't
included in this file should be treated as private and subject to change.

=head2 RCU API

  #include <xrcu/xrcu.hpp>

The following section documents the RCU API that is exposed in XRCU. It's used
internally quite a bit by other parts of the library, yet it's very useful by
itself when implementing other concurrent algorithms or data structures.

=head3 RCU critical section management

RCU is based on the concept of I<critical sections>, code fragments that are
executed under the guarantee that no reclamation can take place. This allows
the user to safely read from shared memory, knowing that it will be valid
during the critical section. Writes have to be synchronized independently
of RCU, however.

Naturally, RCU critical sections don't prevent I<all> memory reclamation,
they only affect the destruction of objects that are derived from a type
defined in XRCU, called C<finalizable>. If you wish to use the RCU API, all
you need to do is make your types derive from C<finalizable> and use its API,
and you'll be set. Any other memory management functions, like C<malloc> and
friends are unaffected by RCU as implemented in this library.

Critical sections are managed through the following API:

=over 4

=item void enter_cs (void);

Makes the calling thread enter a critical section. Critical sections can be
safely nested without problem. As long as we are in one, C<finalizable> objects
cannot be reclaimed.

=item void exit_cs (void);

Exits a critical section. This is generally called after the calling thread
is done reading from shared memory, and wants to signal that reclamations
should be re-allowed. The effects of calling C<exit_cs> when the calling thread
is not in a critical section are undefined.

=item bool in_cs (void);

Returns true if the calling thread is in a critical section; false otherwise.

=item bool sync (void);

Waits until all threads are outside pre-existing critical sections, and returns
true afterwards. If a deadlock is detected (because the calling thread is in a
critical section, for example), this function returns false immediately
without waiting.

=item struct cs_guard

This type is defined such that its constructor calls C<enter_cs>, and its
destructor calls C<exit_cs>; it has no internal state. As its name implies,
it's useful as a guard to manage entering and exiting a critical section in
a way that is exception safe. A few types in this library are derived from
C<cs_guard>, such as iterators, since they need to examine potentially many
elements from a container without their memory being reclaimed.

=back

=head3 RCU finalizable objects

As it was mentioned before, XRCU defines a type called C<finalizable> that
is specifically designed to make its destruction safe (i.e: Only once all
threads are outside a critical section). This type defines the following
interface:

  struct finalizable
    {
      virtual void safe_destroy ();
      virtual ~finalizable ();
    };

Under most circumstances, it's enough for a user-defined type to derive from
C<finalizable> and leave it at that. However, the above 2 methods are provided
as virtual for customization's sake. When a C<finalizable> object is reclaimed
by the RCU subsystem, it will call the C<safe_destroy> method. The default
implementation simply calls the object's destructor and frees the memory
associated to it. If, for whatever reason, a user wants to override this
behaviour, they may do so by extending either of those methods.

In addition, the following API is available when dealing with finalizables:

=over 4

=item void finalize (finalizable *F);

Adds the object C<F> to the calling thread's list of pending finalizable objects.
Each thread has a limit on the number of pending C<finalizables>. Once that
limit is reached, they are scheduled for reclamation, and will be collected
once it's safe to do so.

Only a single call to C<finalize> is allowed on a particular object. If this
function is called more than once on the same object (Either by the same
thread, or another), the behaviour is undefined.

=item bool flush_finalizers (void);

Schedule all the calling thread's accumulated C<finalizable> objects for
reclamation immediately. Returns true if successful, false if a deadlock was
detected. Note that this call may block until all threads are outside a
critical section (Much as a call to C<sync> would).

Once a thread exits, an implicit call to this function is made.

=back

=head3 Miscellaneous functions

These functions don't really belong anywhere else, but they are included in
this file for convenience's sake:

    struct atfork
      {
        void (*prepare) (void);
        void (*parent) (void);
        void (*child) (void);
      };

=over 4

=item atfork atfork_data ();

Returns a structure that implements the needed callbacks for XRCU to mantain a
consistent state across calls to C<fork>. The returned callbacks should be
passed to C<pthread_atfork> (or similar). XRCU itself does not install these
callbacks, because most calls to C<fork> are followed immediately after C<exec>,
and because multi-threaded programs that call C<fork> are rare.

=item void library_version (int& major, int& minor);

Returns the library version as a pair of C<major>, C<minor>. Useful to assert
that a program is using the correct version (at runtime).

=back

=head3 Implementation details

There are many ways to implement RCU. In this library, it was fundamental that
RCU be implemented in a portable, simple way, even if it meant slightly more
overhead. As such, things like signals and OS-specific syscalls were out.

In the end, for XRCU, we decided to use a global registry that uses a stamp
to monitor which threads are in critical sections. By using C++ C<thread_local>
objects, we avoid the need for explicit registration, so that when a thread
first uses the RCU API, it is added automatically to the global registry.

In order to enter a critical section, all a thread has to do is read a global
counter, usually bump it by some small value, and then store that value with
release semantics in its thread-specific data. Exiting a critical section is
almost entirely symmetric (We decrement the thread-specific value), but with a
small caveat that will be explained below.

When an object is finalized, it's prepended to a singly-linked list that is
also kept in thread-specific storage. Once a certain number of them have been
accumulated (specified by the constant C<XRCU_MAX_FINS>), they are scheduled to
be reclaimed. However, if the calling thread is inside a critical section at
that point, a special flag is set instead, that tells the thread to immediately
flush its C<finalizable> objects once it's outside the critical section.

In this implementation, the most expensive operation is undoubtedly C<sync>.
It works by locking the global registry, then checking if any thread is in a
critical section, and sleeping for short periods of time in case there are.
The overhead associated to C<sync> is the main reason why critical sections
should be short, and also why C<finalizable> objects are accumulated instead of
being reclaimed right away.

=head2 Interlude: optionals

    #include <xrcu/optional.hpp>

Before moving on to the topic of containers, we need to take a look at an
auxiliary structure implemented to make things easier and more convenient
for users. This is the C<optional> type.

An optional is a template type that can hold either an object of type I<T>, or
be in an uninitialized state. In other words, the value may or may not be
present. Unlike the usage of a pointer which may be null, an C<optional> never
performs dynamic memory allocation, as the space required to hold the value is
always there, even if it's not being used.

Optional types have been standardized in C++17, and are present in many other
libraries as well. However, since XRCU aims to work with the base minimum of
C++11 compliant compilers, and because it's designed not to depend on other
libraries or frameworks, it contains its own (lightweight) implementation.

=head3 Optional API

An optional is defined as such:

    template <class T>
    struct optional
      {
      };

And its public interface is the following:

=over 4

=item optional ();

Default constructor. Initializes the optional without a value.

=item optional (const T& value);

Initializes the optional to contain C<value>.

=item optional (const optional<T>& other);

Copy constructor. If C<other> has a value, then it initializes the optional to
contain that same value. Otherwise, the optional will contain no value.

=item optional (T&& value);

Move constructor. Initializes the optional by taking ownership of C<value>.

=item optional (optional<T>&& other);

Move constructor. If C<other> has a value, then the optional is initialized
by taking ownership of it. Otherwise, the optional will have no value.

=item T& operator* ();

=item const T& operator* () const;

Returns a (possibly const) reference to the optional's value. The results are
undefined if the optional does not contain one.

=item T* operator-> ();

=item const T* operator-> () const;

Returns a (possibly const) pointer to the optional's value. The results are
undefined if the optional does not contain one.

=item bool has_value () const;

Returns true if the optional has a value.

=item void reset () const;

If the optional contains a value, call its destructor and make the optional
contain no value afterwards. Otherwise, there are no effects.

=item optional& operator= (const optional<T>& other);

If C<other> has a value, assigns it to the optional. Otherwise, calls C<reset>
on the optional and leaves it without a value. Returns C<*this>.

=item optional& operator= (const T& value);

Assigns C<value> to the optional. Returns C<*this>.

=item optional& operator= (optional<T>&& other);

If C<other> has a value, assigns it to the optional by moving it. Otherwise,
calls C<reset> on the optional and leaves it without a value. Returns C<*this>.

=item optional& operator= (T&& value);

Assigns C<value> to the optional by moving it. Returns C<*this>.

=item ~optional ();

Destroys the value associated to the optional, if it had any.

=back

=head3 Implementation details

Optionals are rather simple: They are implemented by using a flat buffer
on which placement new is called. Once an object has been constructed,
a pointer is cached for the buffer. This pointer will be used when fetching
the value, or destroying the object (it's null for empty optionals).

Optionals are very useful when performing lookups in mapped containers, since
they allow us to bypass the need of additional output parameters to determine
if a search was successful in an atomic way.

Another reason as to why optional values are used so extensively in XRCU is
because we consider them to be the best way to signal both a valid object,
as well as an error. When using lock-less data structures, it's impossible to
determine beforehand whether an error will occur when calling a method (because
multithreading makes things non deterministic). Throwing an exception was
considered a bit too "punishing" - it's not precisely a programming error,
after all.

=head2 Stacks

  #include <xrcu/stack.hpp>

Stacks are the simplest of the lock-free data structures: They represent a
basic LIFO container on which you can push and pop items at the back of the
stack. Although their functionality is a bit more limited than other structures,
they are still very useful to implement things like atomic free lists.

In XRCU, stacks meet the requirements for the C++ concept of I<Container>.

=head3 Stack API

Stacks are templated types that can be instantiated with a type T:

  template <class T>
  struct stack
    {
      typedef T value_type;
      typedef T& reference;
      typedef const T& const_reference;
      typedef T* pointer;
      typedef const T* const_pointer;
      typedef ptrdiff_t difference_type;
      typedef size_t size_type;

      struct iterator;
      struct const_iterator;
    };

In C++'s parlance, a stack iterator is a I<forward> iterator, and it may be
used in any function that accepts them.

A stack iterator is a C<cs_guard>, which means that an implicit critical
section is entered when one is created, and is extended until its destruction.

The following describes the public interface for C<stack<TE<gt>>

=over 4

=item stack ();

Default constructor. Initializes a stack to be empty.

=item template <class Integer> stack (Integer n, T value);

Initializes a stack to contain C<n> times C<value>.

=item template <class Iter> stack (Iter first, Iter last);

Initializes a stack with the values in the range [C<first>, C<last>)

=item stack (std::initializer_list<T> lst);

Initializes a stack with the values in C<lst>.

=item stack (const stack<T>& other);

Copy constructor. Initializes a stack with the values in C<other>.

=item stack (stack<T>&& other);

Move constructor. Takes ownership of the values in C<other>.

=item void push (const T& value);

Pushes C<value> to the top of the stack.

=item template <class Iter> void push (Iter first, Iter last)

Pushes the values in the range [C<first>, C<last>) to the stack.

=item template <class Integer> void push (Integer n, T value)

Pushes C<n> times C<value> to the stack.

=item template<class ...Args> void emplace (Args&& ...args)

Same as C<push>, only it constructs a new item by calling the move
constructor with C<args...>.

=item optional<T> pop ();

Removes the item at the top of the stack, and returns an optional with that
value. If the stack is empty, the method returns an optional with no value.

=item optional<T> top ();

Fetches the current item at the top of the stack, and returns an optional
with it as its value. If the stack is empty, returns an optional with no value.

=item size_type size () const;

Returns the current size of the stack.

=item size_type max_size () const;

Returns the maximum allowed size for the stack.

=item bool empty () const;

Returns true if the stack is empty (equivalent to C<size () == 0>).

=item void swap (stack<T>& other);

Swaps the contents of the stack with C<other>, but only if C<other> is not the
same object.

=item stack<T>& operator= (const stack<T>& other);

Assigns the contents of C<other> to the stack and returns C<*this>.

=item stack<T>& operator= (stack<T>&& other);

Move assignment. Takes ownership of the elements in C<other>. Returns C<*this>.

=item bool operator== (const stack<T>& other) const;

Compares the elements of the stack with the ones in C<other>. Returns true
if they are all equal.

=item bool operator!= (const stack<T>& other) const;

Compares the elements of the stack with the ones in C<other>. Returns true
if any two of them are not equal.

=item bool operator<  (const stack<T>& other) const;

=item bool operator>  (const stack<T>& other) const;

=item bool operator<= (const stack<T>& other) const;

=item bool operator>= (const stack<T>& other) const;

Lexicographically compares the elements of the stack with the ones in C<other>,
in a way that is equivalent to calling C<std::lexicographical_compare>.

=item void clear ();

Removes every element from the stack.

=item template <class T1, class T2> void assign (T1 x, T2 y);

Assigns to the stack the elements described by (C<x>, C<y>). They could be an
iterator range, or a pair of (integer, value), as is the case with the stack's
constructor.

=item void assign (std::initializer_list<T> lst);

Assigns to the stack the items in C<lst>.

=item iterator begin ();

=item const_iterator cbegin () const;

Returns an iterator to the first element of the stack.

=item iterator end ();

=item const_iterator cend () const;

Returns an iterator one past the last element of the stack.

=item iterator::iterator ();

Default constructor for stack iterators. Leaves the object in an invalid state.
Dereferencing or incrementing such an iterator has undefined behavior.

=item iterator::iterator (const iterator& other);

Copy constructor. Initializes the iterator to be equal to C<other>.

=item T& iterator::operator* ();

=item const T& const_iterator::operator* () const;

Returns a (possibly const) reference to the iterator's underlying object.

=item T* iterator::operator-> ();

=item const T* const_iterator::operator-> () const;

Returns a (possibly const) pointer to the iterator's underlying object.

=item iterator& iterator::operator++ ();

Pre-increment operator. Moves the iterator forward, and returns it. If the
iterator was at the end of the stack, the results are undefined.

=item iterator iterator::operator++ (int);

Post-increment operator. Moves the iterator forward, and returns an iterator
equal to what it was before incrementing. If the iterator was at the end of
the stack, the results are undefined.

=item bool iterator::operator== (const iterator& other);

=item bool iterator::operator!= (const iterator& other);

Tests for iterator (in)equality.

=back

=head3 Implementation details

There really isn't much to say about the internal details of the stack. It's
probably the easiest lock free structure to implement. As with most other
implementations, the one in XRCU simply consists of an atomic pointer to the
top node. The use of RCU prevents the biggest issue with this design, that is,
the ABA problem.

The only noteworthy thing to point out is that the C<swap> method is safe to
call from multiple threads as well. In order to achive this, the library uses
a special I<sentinel> bit that is temporarily set at the head node when a
swap is undergoing. During a swap, C<push> and C<pop> cannot proceed, since
they check that this special bit is not set before modifying the stack.

Note that because stack iterators are implicit critical sections, and because
of stacks' implementation, iterating a stack will always be safe, even in the
presence of operations like C<push> and C<pop>. The only way for an iterator
to be invalidated is it's at the beggining of the stack, and a call to C<pop>
is made. Even then, dereferencing the iterator is valid, but advancing it will
end prematurely, since the object was unlinked from the stack.

=head2 Queues

    #include <xrcu/queue.hpp>

This is a multi-producer, multi-consumer, FIFO queue. Much like the stack,
elements can be pushed or popped from it, but the order is different: They
will be retrieved in the same order they were inserted.

In XRCU, queues meet the requirements for the C++ concept of I<Container>.

=head3 Queue API

Queues are templated types, and they may be instantiated with a type T:

  template <class T>
  struct queue
    {
      typedef T value_type;
      typedef T& reference;
      typedef const T& const_reference;
      typedef T* pointer;
      typedef const T* const_pointer;
      typedef ptrdiff_t difference_type;
      typedef size_t size_type;

      struct iterator;
      struct const_iterator;
    };

As with the stack, a queue's iterator can be considered a I<forward> iterator,
and it's also a C<cs_guard>, so that a queue may be examined by an iterator,
at any time.

The following describes the public interface for C<queue<TE<gt>>

=over 4

=item queue ();

Default constructor. Initializes a queue to be empty.

=item template <class Integer> queue (Integer n, T value);

Initializes a queue to contain C<n> times C<value>.

=item template <class Iter> queue (Iter first, Iter last);

Initializes a queue with the values in the range [C<first>, C<last>).

=item queue (std::initializer_list<T> lst);

Initializes a queue with the values in C<lst>.

=item queue (const queue<T>& other);

Copy constructor. Initializes a queue with the values in C<other>.

=item queue (queue<T>&& other);

Move constructor. Takes ownership of the values in C<other>.

=item void push (const T& value);

Pushes C<value> to the top of the queue.

=item template <class ...Args> void emplace (Args&& ...args)

Same as C<push>, only the new item is constructed by calling the move
constructor with C<args...>.

=item optional<T> pop ();

Removes the first item from the queue, and returns an optional with that value.
If the queue is empty, the method returns an optional with no value.

=item optional<T> front () const;

Returns an optional with the first value from the queue. If the queue is empty,
an optional with no value is returned instead.

=item optional<T> back () const;

Returns an optional with the last value from the queue. If the queue is empty,
an optional with no value is returned instead.

=item size_type size () const;

Returns the size of the queue.

=item size_type max_size () const;

Returns the maximum allowed size for the queue.

=item void swap (queue<T>& other);

Swaps the contents of the queue with C<other>, but only if C<other> is not
the same object.

=item queue<T>& operator= (const queue<T>& other);

Assigns the contents of C<other> to the queue and returns C<*this>.

=item queue<T>& operator= (queue<T>&& other);

Move assignment. Takes ownership of the elements in C<other>. Returns C<*this>.

=item bool operator== (const queue<T>& other);

Compares the elements of the queue with the ones in C<other>. Returns true
if they are all equal.

=item bool operator!= (const queue<T>& other);

Compares the elements of the queue with the ones in C<other>. Returns true
if any two of them are not equal.

=item bool operator<  (const queue<T>& other) const;

=item bool operator>  (const queue<T>& other) const;

=item bool operator<= (const queue<T>& other) const;

=item bool operator>= (const queue<T>& other) const;

Lexicographically compares the elements of the queue with the ones in C<other>,
in a way that is equivalent to calling C<std::lexicographical_compare>.

=item void clear ();

Removes every element from the queue.

=item template <class T1, class T2> void assign (T1 x, T2 y);

Assigns to the queue the elements described by (C<x>, C<y>). They can be an
iterator range or a pair of (integer, value), as is the case with the queue's
constructor.

=item void assign (std::initializer_list<T> lst);

Assigns to the queue the elements in C<lst>.

=item iterator begin ();

=item const_iterator cbegin () const;

Returns an iterator to the first element of the queue.

=item iterator end ();

=item const_iterator cend () const;

Returns an iterator one past the last element of the queue.

=item iterator::iterator ();

Default constructor for queue iterators. Leaves the object in an invalid state.
Dereferencing or incrementing such an iterator has undefined behavior.

=item iterator::iterator (const iterator& other);

Copy constructor. Initializes the iterator to be equal to C<other>.

=item const T& iterator::operator* ();

=item const T& const_iterator::operator* () const;

Returns a const reference to the iterator's underlying object.

=item iterator& iterator::operator++ ();

Pre-increment operator. Moves the iterator forward, and returns it. If the
iterator was at the end of the stack, the results are undefined.

=item iterator iterator::operator++ (int);

Post-increment operator. Moves the iterator forward, and returns an iterator
equal to what it was before incrementing. If the iterator was at the end of
the stack, the results are undefined.

=item bool iterator::operator== (const iterator& other);

=item bool iterator::operator!= (const iterator& other);

Tests for iterator (in)equality.

=back

=head3 Implementation details

The design of the multi-producer, multi-consumer queue in XRCU is rather
simple and different from other implementations. It consists of a single array
that holds either the elements themselvers, or pointers to them, a decision
taken at compile time via type traits.

In addition to the vector, the queue is composed of three additional counters:
the capacity, a read cursor and a write cursor. The capacity is static, and
simply stores the raw number of elements that the queue can hold before it's
forced to reallocate. The read and write cursors are indices that indicate
at which positions the C<pop> and C<push> (or C<emplace>) operations may take
place.

When an element is pushed into a queue, the write cursor is checked against
the capacity. If the queue is full, then it must be reallocated. This is done
by marking each element with a special bit, allocating a new queue and then
copying over the elements. If there's room in the queue, the element is
atomically set via compare-and-swap, and if successful, the write cursor is
also updated atomically.

Popping elements proceeds in a similar way. The read cursor is compared against
the write cursor, and returns prematurely with failure if there are no more
elements to pop. Otherwise, it's atomically replaced with a special marker,
after which the read cursor is updated.

=head2 Skip lists

    #include <xrcu/skip_list.hpp>

A skip list is an associative container that holds a sorted set of unique
objects of a particular type. In addition to the I<Key> type, skip lists are
instantiated with a comparator type that allows them to determine ordering.

In XRCU, skip lists meet the requirements for the C++ concept of I<Container>,
and I<AssociativeContainer>.

=head3 Skip list API

Skip lists are templated types, defined in the following way:

    template <class T, class Cmp = std::less<T> >
    struct skip_list
      {
        typedef T value_type;
        typedef T key_type;
        typedef Cmp key_compare;
        typedef Cmp value_compare;
        typedef T& reference;
        typedef const T& const_reference;
        typedef T* pointer;
        typedef const T* const_pointer;
        typedef ptrdiff_t difference_type;
        typedef size_t size_type;

        struct iterator;
        struct const_iterator;
      };

The template parameter C<T> refers to the key type, whereas C<Cmp> is the
comparator type, which defaults to C<std::less>. Under most circumstances,
that is usually enough, but users may instantiate with any other type that
defines the C<operator()> which returns a boolean.

As shown above, skip_list has iterators, but they are always constant, for
practicality reasons. Much like with other containers, skip list iterators are
I<forward> iterators, and an implicit C<cs_guard>.

The following describes the public interface for C<skip_list<T, CmpE<gt>>

=over 4

=item skip_list (Cmp c = Cmp (), unsigned int depth = ...);

Initializes the skip list with comparator C<c>, which defaults to a default
constructed value. Also takes a parameter indicating the maximum depth a skip
list node may have, which defaults to an implementation-specified value. Users
shouldn't need to change the latter, but it can be useful when tuning the
application for performance. As a general rule, a higher value implies more
memory usage, but better performance.

=item template <class Iter> skip_list (Iter first, Iter last, Cmp c = Cmp (), unsigned int depth = ...);

Initializes the skip list to the values in [C<first>, C<last>). The comparator
and depth parameters are the same as explained above.

=item skip_list (std::initializer_list<T> lst, Cmp c = Cmp (), unsigned int depth = ...);

Initializes the skip list to the values in C<lst>. The comparator and depth
parameters are the same as explained above.

=item skip_list (const skip_list<T, Cmp>& other);

Copy constructor. Initializes the skip list to hold the values in C<other>.

=item skip_list (skip_list<T, Cmp>&& other);

Move constructor. Takes ownership of the values in C<other>.

=item optional<T> find (const T& key) const;

Searches for an element equivalent to C<key> in the skip list, and returns an
optional with it as its value. If the key couldn't be found, the returned
optional has no value.

=item bool contains (const T& key) const;

Returns true if C<key> is present in the skip list.

=item bool insert (const T& key) const;

Inserts C<key> in the skip list. Returns true if the key wasn't present
previous to this call.

=item bool erase (const T& key) const;

Erases C<key> from the skip list. Returns true if the key was present previous
to this call.

=item optional<T> remove (const T& key);

Erases C<key> from the skip list and returns an optional with that element as
its value, if the key was present. Otherwise, the returned optional has no
value.

=item iterator begin ();

=item const_iterator cbegin () const;

Returns an iterator to the first element of the skip list.

=item iterator end ();

=item const_iterator cend () const;

Returns an iterator one past the last element of the skip list.

=item size_type size () const;

Returns the current size of the skip list.

=item size_type max_size () const;

Returns the maximum allowed size for a skip list.

=item bool empty () const;

Returns true if the skip list is empty (i.e: its size is 0).

=item template <class Iter> void assign (Iter first, Iter last);

Assigns to the skip list the elements in [C<first>, C<last>).

=item void assign (std::initializer_list<T> lst);

Assigns to the skip list the elements in C<lst>.

=item skip_list& operator= (const skip_list<T, Cmp>& other);

Assigns the elements in C<other> to the skip list. Returns C<*this>.

=item skip_list& operator= (skip_list<T, Cmp>&& other);

Move assignment. Takes ownership of the elements in C<other>. Returns C<*this>.

=item void swap (skip_list<T, Cmp>& other);

Swaps the contents of the skip list with C<other>, but only if C<other> is a
different object.

=item void clear ();

Removes every element from the skip list.

=back

=head3 Implementation details

In XRCU, skip lists are implemented as described in any piece of literature
that talks about them. Basically, every skip list of depth I<D> has a head
node that can be linked with up to other I<D> nodes. When performing a lookup
for an element, we start at the head node, and move horizontally until the
current element is equal or greater. If it's equal, we the lookup succeeded.
Otherwise, we move vertically to the next node, until we either find the
element, or we exhausted every node.

Insertions first perform a lookup, and if the search failed, we create a new
node with the specified value as its key. The number of linked nodes it will
have is determined in a semi-random way, but it must never exceed the maximum
depth of the skip list. After the node is created, we try to atomically link it
with its computed predecesors and successors (Returned in the same lookup call
we did before). If we succeed, the node is considered part of the skip list;
otherwise, we free it and retry.

Erasing an element proceeds in a similar fashion, only we check that the lookup
I<did not> fail, and then atomically relink the node's predecessors with the
node's own successors, thereby removing the key from the skip list.

Skip lists mantain an atomic word to store the length. However, this is used
slightly differently as one would think, because the lowest bit is used as an
ad-hoc lock bit in order to serialize concurrent calls to C<swap>.

=head2 Hash tables

    #include <xrcu/hash_table.hpp>

Hash tables are associative containers that map unique keys to values. Ordering
is unspecified for both keys and values. In addition to the key and value types,
hash tables are instantiated with a hashing type and an equality type;
callables that compute a I<hash value> for a given key, and one that tests for
equality, given two keys, respectively.

Additionally, hash tables mantain a I<load factor> that determines when a full
rehash is performed. A rehash implies moving all the key-value pairs into a new
location to reduce the overhead of most operations.

In XRCU, hash tables meet the requirements for the C++ concept of I<Container>
and I<UnorderedAssociativeContainer>.

=head3 Hash table API

As mentioned above, hash tables are template types, defined like this:

    template <class Key, class Val,
              class Equal = std::equal<Key>,
              class Hash = std::hash<Key> >
    struct hash_table
      {
        typedef Val mapped_type;
        typedef Key key_type;
        typedef std::pair<Key, Val> value_type;
        typedef Equal key_equal;
        typedef Hash hasher;
        typedef value_type& reference;
        typedef const value_type& const_reference;
        typedef value_type* pointer;
        typedef const value_type* const_pointer;
        typedef ptrdiff_t difference_type;
        typedef size_t size_type;

        struct iterator;
        struct const_iterator;
      };

The template parameters should be pretty self explanatory: They refer to the
key, value, equality and hashing types, in that order. The C<Equal> type has to
operate on two keys and return a boolean value that determines their equality,
whereas the C<Hash> type has to operate on keys and return unsigned integers.
Both are allowed to throw exceptions, although it is not really wise to do so.

Much like with skip lists, hash table iterators are always constant, and also
an implicit C<cs_guard>.

The following describes public interface for hash tables:

=over 4

=item hash_table (size_t size = 0, float lf = 0.85, Equal e = Equal (), Hash h = Hash ())

Initializes the hash table to hold C<size> elements, with a load factor of
C<lf>, using the equality predicate C<e> and the hasher C<h>. The load factor
must be in the range [0.4, 0.9]; if it's not, then it will silently be set to
the default of 0.85.

=item template <class It> hash_table (It first, It last, float lf = 0.85, Equal e = Equal (), Hash h = Hash ())

Initializes the hash table with the values between (C<first>, C<last>]. The
elements in that range must have two public members defined: C<first> and
C<second>, referring to the key and value of each element, respectively, and
in a similar fashion to what C<std::pair> does. The other parameters work as
with the previous constructor.

=item hash_table (std::initializer_list<std::pair<Key, Val> >, float lf = 0.85, Equal e = Equal (), Hash h = Hash ())

Initializes the hash table with the values in C<lst>. The rest of the parameters
work as with the previous constructors.

=item hash_table (const hash_table<Key, Val, Hash, Equal>& other);

Copy constructor. Initializes the hash table with the values in C<other>.

=item hash_table (hash_table<Key, Val, Hash, Equal>&& other);

Move constructor. Takes ownership of the values in C<other>.

=item size_t size () const;

Returns the hash table size.

=item size_t max_size () const;

Returns the maximum allowed size for a hash table.

=item bool empty () const;

Returns true if the hash table is empty.

=item optional<Val> find (const Key& key) const;

=item Val find (const Key& key, const Val& defl) const;

Look for C<key> in the hash table. If it's found, returns the value associated
to it. If it's not found, returns an empty optional in the first version, or
C<defl> in the second one.

=item bool contains (const Key& key) const;

Returns true if the key is present in the hash table.

=item bool insert (const Key& key, const Val& val);

Associates C<key> with C<val> in the hash table. Returns true if the value was
not present. Otherwise, the former value is replaced.

=item template <class Fn, class ...Args> bool update (const Key& key, Fn f, Args... args);

Updates the value associated to C<key> by calling C<f> with it and the rest of
the arguments. In other words, if C<val> is the value associated to C<key>,
calls C<f (val, args...)> and updates the value with the result. If no value was
present for C<key>, the function is called with a default constructed value
instead (as if by calling C<Val ()>).

Note that the function takes a I<reference> to the value, and so it's possible
to modify it in place. Additionally, if the function returns the very same
object that was passed (i.e: the same reference that it received), there won't
be any modifications made to the value in the table. If, however, the function
returns a different object, this call will need to atomically update the value.
The passed function may be called more than once if the atomic updates fail.

Returns true if C<key> was not present in the hash table before the call.

=item bool erase (const Key& key);

Erases C<key> from the hash table. Returns true if C<key> was present before
the call.

=item optional<Val> remove (const Key& key);

Removes C<key> from the hash table and returns the value that was associated
to if, if it was present.

=item void clear ();

Removes every element from the hash table.

=item template <class Iter> void assign (Iter first, Iter last);

Assigns the elements in [C<first>, C<last>) to the hash table. The same
restrictions as the range constructor apply here.

=item void assign (std::initializer_list<std::pair<Key, Val>> lst);

Assigns the elements in C<lst> to the hash table.

=item hash_table<Key, Val>& operator= (const hash_table<Key, Val>& other);

Assigns the elements in C<other> to the hash table. Returns C<*this>.

=item hash_table<Key, Val>& operator= (hash_table<Key, Val>&& other);

Move assignment. Takes ownership of the elements in C<other>. Returns C<*this>.

=item void swap (hash_table<Key, Val>& other);

Swaps the contents of the hash table with C<other>, but only if C<other> is a
different object.

=item iterator begin ();

=item const_iterator cbegin () const;

Returns an iterator to the first element of the hash table.

=item iterator end ();

=item const_iterator cend () const;

Returns an iterator one past the last element of the hash table.

=item iterator::iterator ();

Default constructor for hash table iterators. Leaves the object in an invalid
state. Dereferencing or incrementing such an iterator has undefined behavior.

=item iterator::iterator (const iterator& other);

Copy constructor. Initializes the iterator to be equal to C<other>.

=item Key iterator::key () const;

Returns the key for the iterator.

=item Val iterator::value () const;

Returns the value for the iterator.

=item std::pair<Key, Val> iterator::operator* () const;

Returns an C<std::pair> constructed with the key and value of the iterator.

=item iterator& iterator::operator++ ();

Pre-increment operator. Moves the iterator forward and returns it. If the
iterator was at the end, the results are undefined.

=item iterator iterator::operator++ (int);

Post-increment operator. Moves the iterator forward, and returns an iterator
equal to what it was before incrementing. If the iterator was at the end, the
results are undefined.

=item bool iterator::operator== (const iterator& other);

=item bool iterator::operator!= (const iterator& other);

Test for iterator (in)equality.

=back

=head3 Implementation details

Hash tables are somewhat complex, because the atomicity requirements force us
to do some rather convoluted things. To start off, a hash table is essentially
a vector of consecutive C<key> and C<value> pairs, with some special values
indicating C<free> and C<deleted> entries. However, since we can only operate
atomically on integers, we wrap any other type that is not integral into a
dynamically allocated pointer. This is done based on the template instantiation
and is figured out at compile time.

When a lookup is performed, the passed C<key> is hashed with the table's own
internal hasher and a hash code is computed. Using this code, we produce an
index into the vector and then we iterate over it, looking for a key that
matches, or until we find a free entry, in which case the lookup fails.

Insertions are a bit more complicated. We start by performing a lookup, as
described above, and then determine what to do based on the result: If the
lookup came up empty, then we need to insert both the key I<and> the value;
otherwise, we only need to update the value.

If only the value needs to be updated (because the key is already present),
then the insertion simply consists of atomically swapping the old value with
the new one, and, if succesful, finalizing the old value as well.

If we need to insert the key as well, first we decrement an internal counter
that keeps track of how many additional elements we can insert before a rehash
is triggered. If the counter has already reached zero, then we have to rehash
before inserting the key and value. If no rehash is needed, or after it's done,
both the key and value are updated atomically (Some platforms allow us to do it
in a single step, otherwise the operations are done sequentially).

At this point, we have to pause to make a sort of confession: Hash tables as
implemented in XRCU are not I<entirely> lock free, because rehashing actually
takes an internal lock (I know, you're crushed). This not because of any
intrinsical limitation, but rather, to make things easier. Since rehashes are
probably the least common of the operations, we figured it wasn't that big of
a deal anyways.

Going back to rehashes, these are the most expensive operations. As mentioned,
they start off by acquiring an internal lock, and then they iterate over the
full vector, marking the C<value> indexes with a special bit that signals that
the table is being rehashed. When this bit is set, insertions and erasures are
forbidden (They both check for this bit before proceeding).

Erasures are pretty simple in comparison. They obviously perform a lookup on
the key, and if it didn't come up empty, they atomically swap out the value
for the special C<empty> constant. Afterwards, they can mutate the key entry
without atomicity (Because erased entries cannot be reused).

=head1 BUGS

All implemented containers use standard operators C<new> and C<delete> to
perform memory (de)allocations. There's no way to specify custom allocators
yet, although it's planned in the future.

=head1 AUTHORS

Agustina Arzille (avarzille@riseup.net) and Luciano Lo Giudice (lmlogiudice@gmail.com)
